Schema Extraction and Structural Outlier Detection for JSON-based NoSQL Data Stores

Authors

  • Meike Klettke
  • Uta Störl
  • Stefanie Scherzinger
Abstract

Although most NoSQL data stores are schema-less, information on the structural properties of the persisted data is nevertheless essential during application development. Otherwise, accessing the data becomes impractical. In this paper, we introduce an algorithm for schema extraction that operates outside of the NoSQL data store. Our method specifically targets semi-structured data persisted in NoSQL stores, e.g., in JSON format. Rather than designing the schema up front, extracting a schema in hindsight can be seen as a reverse-engineering step. Based on the extracted schema information, we propose a set of similarity measures that capture the degree of heterogeneity of JSON data and reveal structural outliers in the data. We evaluate our implementation on two real-life datasets: a database from the Wendelstein 7-X project and Web Performance Data.
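
The idea of extracting a schema in hindsight and flagging structural outliers can be illustrated with a minimal Python sketch. This is not the authors' algorithm or their similarity measures: the function names (`type_name`, `property_paths`, `extract_schema`, `structural_outliers`), the path notation, and the rule "a property is an outlier if it occurs in less than a given fraction of documents" are all assumptions made for illustration only.

```python
from collections import Counter


def type_name(value):
    """Map a decoded JSON value to a JSON type name (illustrative helper)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    return "object"


def property_paths(value, prefix=""):
    """Yield (path, type) pairs for every property nested in a JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}/{key}"
            yield path, type_name(child)
            yield from property_paths(child, path)
    elif isinstance(value, list):
        for child in value:
            item_path = f"{prefix}[]"  # all array items share one path
            yield item_path, type_name(child)
            yield from property_paths(child, item_path)


def extract_schema(documents):
    """Count, per (path, type) pair, the number of documents containing it."""
    counts = Counter()
    for doc in documents:
        counts.update(set(property_paths(doc)))  # count once per document
    return counts


def structural_outliers(documents, threshold=0.1):
    """Return (path, type) pairs found in fewer than `threshold` of documents.

    The threshold rule is an assumed, simplified notion of an outlier.
    """
    counts = extract_schema(documents)
    total = len(documents)
    return {entry: n for entry, n in counts.items() if n / total < threshold}
```

Running `extract_schema` over a list of decoded JSON documents gives an occurrence count for each property path and type, and `structural_outliers` lists the rarely occurring ones. Because the sketch only reads decoded documents, it runs outside the data store, as the abstract describes for the paper's approach.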

Similar resources

Data Mining from Document-append Nosql

Due to the unstructured nature of modern digital data, NoSQL storages have been adopted by some enterprises as the preferred storage facility. NoSQL storages can store schema-oriented, semi-structured, schema-less data. A type of NoSQL storage is the document-append storage (e.g., CouchDB and Mongo) which has received high adoption due to its flexibility to store JSON-based data and files as at...

Cleager: Eager Schema Evolution in NoSQL Document Stores

Schema-less NoSQL data stores offer great flexibility in application development, particularly in the early stages of software design. Yet over time, software engineers struggle with the heavy burden of dealing with increasingly heterogeneous data. In this demo we present Cleager, a framework for eagerly managing schema evolution in schema-less NoSQL document stores. Cleager executes declarativ...

Multiterm Keyword Searching For Key Value Based NoSQL System

Today, the enterprise landscape faces large amounts of data. The information gathered from these data sources is useful for improving product and service delivery. However, it is challenging to perform searching activities on these data sources because of their unstructured nature. Due to the unstructured nature of these data, NoSQL storage has been adopted by many enterprises because it provides ...

An Approach of SQL to JSON Transformation For Handling Database Operations

Nowadays NoSQL databases are becoming more popular. Companies like Google, Facebook, and Amazon have created their own NoSQL databases based on their requirements. Different NoSQL databases follow different querying approaches, whereas traditional databases like MySQL, ORACLE, etc. use SQL for querying. Most companies are shifting from traditional databases to NoSQL ...

Temporal JSON Schema Versioning in the JSchema Framework

Nowadays, NoSQL databases [1-3] are being used in several emerging application fields (including social network management, e-health and e-government systems, big-science projects) to store data with unconventional types, and specifically big data [4-6]. Furthermore, those are live application domains where the evolution of the stored NoSQL data and of their structure has to keep the pace of th...

Publication date: 2015